Nonlinear germanium-silicon photodiode for activation and monitoring in photonic neuromorphic networks

Silicon photonics is promising for artificial neural network computing owing to its superior interconnect bandwidth, low energy consumption and scalable fabrication. However, the lack of silicon-integrated and monitorable optical neurons limits its adoption in large-scale artificial neural networks. Here, we highlight nonlinear germanium-silicon photodiodes to construct on-chip optical neurons and a self-monitored all-optical neural network. With specifically engineered optical-to-optical and optical-to-electrical responses, the proposed neuron merges the all-optical activation and non-intrusive monitoring functions in a compact footprint of 4.3 × 8 μm². Experimentally, a scalable three-layer photonic neural network enables in situ training and learning in object classification and semantic segmentation tasks. The performance of this neuron implemented in a deep-scale neural network is further confirmed via handwriting recognition, achieving a high accuracy of 97.3%. We believe this work will enable future large-scale photonic intelligent processors with more functionalities but simplified architecture.


Supplementary Note 2. Analytical coupled equation
Assuming that the Ge absorber is one-dimensional and free of electric field, the interplay of intrinsic absorption, two-photon absorption and free-carrier absorption (FCA) can be described by the nonlinear Schrödinger equation (Supplementary Equation (1)) and the carrier rate equation. Here, I(z) and N(z, t) are the optical intensity and carrier concentration, respectively, and the material parameters are the intrinsic absorption coefficient, the two-photon absorption coefficient, the free-carrier cross-section and the carrier lifetime τ_s. Solving Supplementary Equations (2-3) yields Supplementary Equation (4). Substituting Supplementary Equation (4) into Supplementary Equation (1) gives an equation for the optical intensity alone. Using the definition of optical intensity, I=P/S (where P is the optical power and S is the incident area), the solution can finally be expressed in terms of the optical power.
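The interplay of the three absorption mechanisms can be illustrated numerically. The sketch below is not the analytical solution of this note: it assumes the commonly used intensity equation dI/dz = -(α + βI + σN)·I together with a steady-state carrier density N = τ_s·(αI + βI²/2)/(hν). The symbols α, β and σ and all numerical values are placeholders chosen for illustration, not the device parameters.

```python
import math

H_PLANCK = 6.62607015e-34  # Planck constant, J s
C_LIGHT = 2.99792458e8     # speed of light, m/s


def steady_state_carriers(intensity, alpha, beta, tau_s, wavelength):
    """Steady-state carrier density (m^-3), assuming one- and two-photon generation."""
    photon_energy = H_PLANCK * C_LIGHT / wavelength
    generation = (alpha * intensity + 0.5 * beta * intensity ** 2) / photon_energy
    return tau_s * generation


def propagate(i0, length, alpha, beta, sigma, tau_s, wavelength, steps=2000):
    """Integrate dI/dz = -(alpha + beta*I + sigma*N(I)) * I with a fixed-step RK4."""
    def rhs(i):
        n = steady_state_carriers(i, alpha, beta, tau_s, wavelength)
        return -(alpha + beta * i + sigma * n) * i

    dz = length / steps
    i = i0
    for _ in range(steps):
        k1 = rhs(i)
        k2 = rhs(i + 0.5 * dz * k1)
        k3 = rhs(i + 0.5 * dz * k2)
        k4 = rhs(i + dz * k3)
        i += dz * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return i


# Placeholder parameters (assumptions, SI units): absorption 1.6e5 m^-1,
# two-photon coefficient 1e-9 m/W, free-carrier cross-section 1e-21 m^2,
# carrier lifetime 1 ns, wavelength 1550 nm, Ge length 8 um.
PARAMS = dict(alpha=1.6e5, beta=1e-9, sigma=1e-21, tau_s=1e-9, wavelength=1550e-9)

for i_in in (1e6, 1e8, 1e10):  # input intensities in W/m^2
    transmission = propagate(i_in, 8e-6, **PARAMS) / i_in
    print(f"I_in = {i_in:.0e} W/m^2 -> transmission = {transmission:.4f}")
```

With these placeholders the transmission falls as the input intensity rises, which is the qualitative behavior that the FCA term is meant to capture.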

Supplementary Note 3. Activation function
Supplementary Fig. 3a shows the partial electrode structure used in the proposed device. The device consumes a portion of the optical power to generate carriers through the intrinsic absorption in Ge. A key point is whether the carriers can transit to the electrode within the carrier lifetime τ_s (the time for carrier recombination). This determines whether or not the carriers accumulate and produce the FCA effect.
In the electrodeless region (with a weak electric field and carrier transit time τ_t >> τ_s), carriers accumulate and enable the FCA effect. In the region with the electrode (with a strong electric field and τ_t << τ_s), the carriers are rapidly collected by the electrode, and no FCA effect occurs. Fortunately, these collected carriers can be used for optical monitoring. For the conventional photodiode (Supplementary Fig. 3b), the electric field is distributed over the entire Ge region and facilitates the transport of carriers, thereby suppressing the carrier accumulation and optical nonlinearity.
Along the light propagation direction (z-axis) of the partial electrode structure, the electric field is determined by V_b, h_Ge, L_Ge and L_E, where V_b is the built-in voltage of the photodiode, and h_Ge, L_Ge and L_E are the Ge height, Ge length and electrode length, respectively. In the absence of external bias, the electric field derives from the built-in voltage, which is set by N_D, N_A and n_i, the doping concentrations of the N++ Ge, P+ Si and i-Ge regions, respectively. These concentrations determine the built-in electric field and thereby engineer the carrier transport. k, T and q are the Boltzmann constant, temperature and electron charge, respectively.
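As a rough numerical check of the built-in field, the sketch below uses the textbook junction expression V_b = (kT/q)·ln(N_D·N_A/n_i²), which contains exactly the quantities named above, and estimates the field under the electrode as V_b/h_Ge. Both the choice of this expression and the concentration values are assumptions made for illustration; the Ge/Si heterojunction in the actual device may require a more detailed model.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant, J/K
Q_E = 1.602176634e-19   # elementary charge, C


def built_in_voltage(n_d, n_a, n_i, temperature=300.0):
    """Textbook built-in voltage (V); concentrations must share the same unit."""
    return (K_B * temperature / Q_E) * math.log(n_d * n_a / n_i ** 2)


def field_under_electrode(v_b, h_ge_cm):
    """Uniform-field estimate (V/cm) across a Ge layer of height h_ge_cm (assumption)."""
    return v_b / h_ge_cm


# Placeholder concentrations in cm^-3 (assumptions): 1e19 for both doped
# regions and 2e13 for the i-Ge region.
v_b = built_in_voltage(1e19, 1e19, 2e13)
e_field = field_under_electrode(v_b, 0.5e-4)  # h_Ge = 0.5 um expressed in cm
print(f"V_b = {v_b:.3f} V, E = {e_field / 1e3:.1f} kV/cm")
```

With these placeholder values the estimated field is of the same order as the 10 kV cm⁻¹ used for the transit-time calculation in this note.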
The carrier transit time τ_t follows from the Ge height, the carrier mobility μ and the electric field E. Here, τ_t = 50 fs << τ_s (~1 ns) in the electrode region (calculated with E = 10 kV cm⁻¹, h_Ge = 0.5 μm and μ = 3900 cm² V⁻¹ s⁻¹). Supplementary Equation (4) then gives the final carrier concentration distribution. Therefore, in the electrodeless region, carriers accumulate and enable the optical nonlinear effect. In the region with the electrode, the carriers are rapidly collected by the electrode, and only the intrinsic absorption occurs. The strength of the nonlinearity can be evaluated by the average carrier concentration.
Finally, for the proposed partial-electrode structure, the nonlinear activation function can be extracted according to Supplementary Equation (8).

Supplementary Note 4. Monitoring photocurrent
The photocurrent originates only from the intrinsic absorption, and the FCA does not contribute to it.
Under low optical power, the photocurrent is proportional to the input optical power. The proportionality factor includes η_cou and η_c, the optical coupling efficiency from the Si waveguide to the Ge film and the carrier collection efficiency, respectively, as well as λ and c, the wavelength and the speed of light, respectively. Among these factors, η_c differs strongly between the regions with and without electrodes; supplementary reference 1 gives its values for E=0 and 10 kV cm⁻¹. The photocurrent of our device follows from these values, and the average carrier collection efficiency can be used to evaluate the optical-electrical response. When the input optical power is large enough, the photocurrent gradually saturates due to the space-charge screening effect 2. In this regime, the photocurrent is expressed in terms of R and I_max, the responsivity at low power and the saturation current, respectively, and k, a parameter that sets the shape of the curve.
At 1550 nm, the simulated loss is 5.7 dB (using the 8-μm Ge length in our device), which agrees well with the experimental result. The optical loss can be further decreased by increasing the operating wavelength or reducing the Ge length. For example, with an 8-μm device operating at 1700 nm or a 4-μm device operating at 1550 nm, the optical loss is reduced to 3 dB. The monitoring responsivity is then expected to decrease to some extent.
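The length dependence of the loss can be checked with simple arithmetic. The sketch below assumes that the loss comes purely from absorption and therefore scales linearly in dB with the Ge length (Beer-Lambert behavior), ignoring coupling and scattering losses. Under that assumption, halving the 8-μm length halves the 5.7 dB loss to 2.85 dB, close to the 3 dB quoted for the 4-μm device. The function names are illustrative.

```python
def loss_db_for_length(ref_loss_db, ref_length_um, length_um):
    """Scale an absorption loss in dB linearly with length (Beer-Lambert assumption)."""
    return ref_loss_db * length_um / ref_length_um


def transmitted_fraction(loss_db):
    """Convert a loss in dB to the fraction of optical power transmitted."""
    return 10.0 ** (-loss_db / 10.0)


# Reference point from the text: 5.7 dB for an 8-um Ge length at 1550 nm.
for length_um in (8.0, 4.0):
    loss = loss_db_for_length(5.7, 8.0, length_um)
    print(f"L_Ge = {length_um} um: loss = {loss:.2f} dB, "
          f"transmitted fraction = {transmitted_fraction(loss):.3f}")
```

A shorter absorber transmits more light but also absorbs less, which is why the monitoring responsivity is expected to drop.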

Supplementary Note 7. Influence of doping
As shown in Supplementary Fig. 1b, the N++ Ge and P+ Si doping forms the n-i-p photodiode. The doped regions introduce the built-in electric field, which facilitates the transport of carriers and thereby suppresses the carrier accumulation and optical nonlinearity. We therefore achieve strong nonlinearity by using partial electrodes and a corresponding partial doping region. We place the electrodes and doping in the outer region, where the optical field is weak and the photo-generated carrier concentration is low. The doping itself only weakly affects the optical nonlinearity but more strongly affects the speed of the optical-electrical response (the bandwidth of the photodiode). A higher doping concentration (denoted as N_p) reduces the series resistance of the photodiode (R_s in Supplementary Fig. 5c) and thus improves the bandwidth. The simulated bandwidth versus N_p is shown as the gray line in Supplementary Fig. 6a. The other circuit components are extracted from the measured S_21 response. When N_p exceeds 1×10¹⁹ cm⁻³, the bandwidth reaches a maximum value of 50 GHz. Beyond this concentration, the bandwidth no longer increases, as it is then limited by other factors (such as the carrier transit time). In contrast, the FCA in the P+ Si region increases significantly, as shown by the red line in Supplementary Fig. 6a. Although the Si FCA also leads to optical nonlinearity, the light it consumes does not contribute to optical monitoring and instead causes additional optical loss. As a compromise, the doping concentration is selected as 1×10¹⁹ cm⁻³.
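The bandwidth trend described above (improving with lower R_s, then saturating) can be reproduced with a generic photodiode model. The sketch below assumes an RC-limited bandwidth f_RC = 1/(2π(R_s + R_L)C_j) combined with a transit-time limit f_t as 1/f² = 1/f_RC² + 1/f_t². This is a common simplification, not the equivalent circuit of Supplementary Fig. 5c, and the load resistance, capacitance and transit-limited bandwidth are placeholder values.

```python
import math


def rc_bandwidth_hz(r_series_ohm, r_load_ohm, c_junction_f):
    """RC-limited 3-dB bandwidth of a simple series-resistance photodiode model."""
    return 1.0 / (2.0 * math.pi * (r_series_ohm + r_load_ohm) * c_junction_f)


def total_bandwidth_hz(f_rc_hz, f_transit_hz):
    """Combine RC and transit-time limits as 1/f^2 = 1/f_rc^2 + 1/f_t^2."""
    return 1.0 / math.sqrt(1.0 / f_rc_hz ** 2 + 1.0 / f_transit_hz ** 2)


# Placeholder values (assumptions): 50-ohm load, 10 fF junction capacitance,
# 60 GHz transit-limited bandwidth.
for r_s in (400.0, 100.0, 25.0, 5.0):
    f_rc = rc_bandwidth_hz(r_s, 50.0, 10e-15)
    f_total = total_bandwidth_hz(f_rc, 60e9)
    print(f"R_s = {r_s:5.1f} ohm -> bandwidth = {f_total / 1e9:.1f} GHz")
```

As R_s decreases, the total bandwidth approaches the transit-time limit, which mirrors the saturation described for high doping concentrations.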
On the other hand, the doping depth of the N++ Ge changes the depletion region depth (Ge thickness - doping depth) and thus engineers the electric field and junction capacitance (C_j), affecting the bandwidth as well. The simulated bandwidth versus doping depth is shown in Supplementary Fig. 6b.
The loss function (LF) is defined as the cross-entropy between the forward propagation result h and the pre-labeled classes y, where y is known in advance and h needs to be measured. Assuming that the activation function of the AONU is P_out = f(P_in) and the optical-electrical response is I_out = g(P_in), the relationship between the output optical power and the photocurrent is P_out = f[g⁻¹(I_out)]. Here, h is the P_out obtained from the last layer and is deduced from the photocurrent.
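The chain from monitored photocurrent to loss value can be sketched as follows. The curves `f_toy` and `g_toy` are hypothetical stand-ins for the measured activation function f and optical-electrical response g, and normalizing the output powers to sum to one before taking the cross-entropy is an assumption. The inversion g⁻¹ is done by bisection, which only requires g to be monotonically increasing over the operating range.

```python
import math


def cross_entropy(h, y, eps=1e-12):
    """Cross-entropy between predicted distribution h and one-hot labels y."""
    return -sum(yi * math.log(max(hi, eps)) for hi, yi in zip(h, y))


def invert_monotonic(g, target, lo, hi, iters=60):
    """Find p in [lo, hi] with g(p) = target by bisection (g must be increasing)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def g_toy(p_in_mw):
    """Hypothetical saturating optical-electrical response (mA versus mW)."""
    return 0.5 * p_in_mw / (1.0 + p_in_mw / 20.0)


def f_toy(p_in_mw):
    """Hypothetical activation: transmission drops as the input power rises."""
    return 0.27 * p_in_mw / (1.0 + p_in_mw / 10.0)


def output_power_from_current(i_out_ma, p_max_mw=10.0):
    """P_out = f[g^-1(I_out)], using the toy curves above."""
    return f_toy(invert_monotonic(g_toy, i_out_ma, 0.0, p_max_mw))


# Photocurrents monitored at the last layer (made-up numbers), one per class.
currents_ma = [0.4, 1.5, 0.8]
powers = [output_power_from_current(i) for i in currents_ma]
h = [p / sum(powers) for p in powers]  # normalization is an assumption
print("loss =", cross_entropy(h, [0, 1, 0]))
```

In an experiment, the toy curves would be replaced by calibration data of the device, for example as interpolated lookup tables.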
We use an in-situ gradient measurement method to directly obtain the gradient of each distinct weight parameter. It is well known that the gradient for a particular weight parameter ΔW_ij can be […] response (P_in > 15 mW), the photocurrent remains almost constant (or has only a small slope), and the optical power and the gradient cannot be resolved with high resolution. As a result, the training process is terminated by photocurrent saturation. In our implementation, we keep the optical power within the nonlinear part of the neuron response (less than 10 mW), so the photocurrent nonlinearity does not affect the training.
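One way to picture an in-situ gradient measurement is a forward finite-difference estimate: each weight is perturbed by a small amount, the loss is measured again, and the difference quotient approximates the gradient. The sketch below assumes this scheme for illustration only. `loss_fn` stands for a complete forward measurement on the chip and is replaced here by a simple quadratic toy function.

```python
def finite_difference_gradient(loss_fn, weights, delta=1e-3):
    """Estimate dL/dw_i as [L(w + delta*e_i) - L(w)] / delta for every weight."""
    base = loss_fn(weights)
    grads = []
    for i in range(len(weights)):
        perturbed = list(weights)
        perturbed[i] += delta
        grads.append((loss_fn(perturbed) - base) / delta)
    return grads


def gradient_step(loss_fn, weights, learning_rate=0.1, delta=1e-3):
    """One gradient-descent update using the finite-difference estimate."""
    grads = finite_difference_gradient(loss_fn, weights, delta)
    return [w - learning_rate * g for w, g in zip(weights, grads)]


def toy_loss(weights, target=(0.3, -0.2, 0.7)):
    """Stand-in for a measured loss: squared distance to a hypothetical optimum."""
    return sum((w - t) ** 2 for w, t in zip(weights, target))


weights = [0.0, 0.0, 0.0]
for _ in range(50):
    weights = gradient_step(toy_loss, weights)
print("weights after training:", [round(w, 3) for w in weights])
```

The estimate is only as good as the resolution of the measured loss, which is why a saturated photocurrent, with its nearly flat response, prevents the gradient from being resolved.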

Supplementary Note 9. Non-intrusive/intrusive monitoring
We compare the performance and stability of neural networks using the proposed non-intrusive and […], and this means that the system performance (accuracy) is unstable when the neural network is working.
Furthermore, the number of iterations that intrusive monitoring needs to reach an optimal accuracy increases with the degree of perturbation, resulting in an increased training cost.
The computing speed is defined as the number of operations per second (FLOPS). Assuming our system has N nodes, m layers and each layer contains an N × N weighting matrix, the system will fulfill […] In principle, such a computing speed is one order of magnitude faster than electronic neural networks.
Then, we estimate the power consumption of the processor. We assume the propagation loss at the circuit level is negligible because the MZI unitary matrix mesh is, in principle, lossless. The power consumption then mainly originates from the electrical power required to control the MZI mesh and the optical power needed to support the optical nonlinearities. Assuming our system has N nodes and each layer contains an N × N weighting matrix, the total number of MZIs in each layer is N(N-1)/2 + N + N(N-1)/2 = N². The measured average power consumption for a 2π phase shift per MZI is ~10 mW. Therefore, the power consumption of the MZI mesh is 10mN² mW. On the other hand, according to Fig. 2e in the main text, each AONU consumes 5~10 mW to excite the optical nonlinearity. Using the maximum value of 10 mW, the total power consumption is P = 10 […]
The energy required for computing scales with the computing speed, and the computing performance is generally evaluated by the energy consumed per operation 7. The energy per operation is therefore expressed as P/FLOPS. In our device, P/FLOPS = 0.27 pJ per operation. This energy per operation is lower than that of an "ideal" electronic computer (1 pJ per operation, assuming no energy is used on data movement) and two orders of magnitude lower than that of conventional GPUs (100 pJ per operation) 8. It should be pointed out that, in the current configuration, the power is mainly used to maintain the working state of the MZIs rather than to excite the optical nonlinearity. If the MZIs could be set with nonvolatile phase-change materials, which would in principle require no power to hold their state, the P/FLOPS would be as low as 250/mN² fJ per operation.
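The MZI count and mesh power quoted above are easy to verify numerically. The sketch below checks the identity N(N-1)/2 + N + N(N-1)/2 = N², evaluates the mesh power 10mN² mW for an example network, and converts a power and an operation rate into energy per operation. The network size (N = 4, m = 3) and the helper names are illustrative assumptions; the operation rate of the actual system is not reproduced here.

```python
def mzi_count_per_layer(n):
    """MZIs in one N x N weighting matrix: two unitary meshes plus a diagonal stage."""
    return n * (n - 1) // 2 + n + n * (n - 1) // 2


def mesh_power_mw(n, m, per_mzi_mw=10.0):
    """Electrical power of the MZI mesh for m layers at ~10 mW per MZI."""
    return per_mzi_mw * m * mzi_count_per_layer(n)


def energy_per_operation_pj(power_mw, operations_per_second):
    """Energy per operation in pJ, given total power in mW and operations per second."""
    return power_mw * 1e-3 / operations_per_second * 1e12


# Example network (assumed size): N = 4 nodes, m = 3 layers.
n_nodes, n_layers = 4, 3
print("MZIs per layer:", mzi_count_per_layer(n_nodes))
print("mesh power (mW):", mesh_power_mw(n_nodes, n_layers))
```

Because the mesh power grows as mN², holding the MZI phases dominates the budget for large networks, which is the motivation for the nonvolatile phase-change option mentioned above.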